Trainging mod implementation (WIP) by EvgeniiMekhanik · Pull Request #2663 · tempesta-tech/tempesta

EvgeniiMekhanik · 2026-06-09T19:11:11Z

No description provided.

const-t

I see that the PR is WIP, but I have a few comments for the future.

const-t · 2026-06-12T14:17:27Z

+	 */
+	if (likely(!tfw_mode_is_disabled())) {
+		s = rcu_dereference(g_stats);
+		percpu_counter_add(&s->sum, delta1);


What is the reason for using percpu_counter instead of a simple per-cpu variable? percpu_counter is pretty large and has overhead, so there must be a reason to use it.

const-t · 2026-06-15T14:50:28Z

@@ -0,0 +1,181 @@
+/**


I suggest renaming this to adaptive_limits.c or similar and using the word "training" only in the sense of "training mode" as a state of the adaptive limits.

const-t · 2026-06-15T14:52:04Z

+	atomic_long_t		max;
+	s64 	__percpu	*counter;
+	u16			epoch;
+} TfwClientCounter;


From my point of view, we should move this to training.h, along with all other related structs.

const-t · 2026-06-16T09:22:09Z

+}
+
+static bool
+tfw_client_counter_training_check(TfwClientCounter *counter,


It seems client.c is not the right place for this function. I would prefer to have it in training.c.

const-t · 2026-06-16T09:33:17Z

+		return defence(curr);
+
+	if (tfw_client_counter_change_max(counter, curr, &delta1, &delta2))
+		adjust_num(delta1, delta2);


I would suggest moving the update of the global stats to tfw_http_conn_recv_finish(); we don't need a live update of the counter during training.

Introduce helper functions for 128-bit arithmetic that are not provided by the Linux kernel:
- 128/32 division using bitwise long division;
- integer square root using binary search.

The library is required for training mode statistics collection, where aggregating metrics across a large number of clients can overflow 64-bit intermediate values.

An evaluation comparing the sum/sumsq and Welford algorithms using both 64-bit and 128-bit arithmetic showed that 64-bit implementations become inaccurate for workloads with approximately 100,000 or more clients due to intermediate overflows, while both 128-bit implementations match the exact results across all tested workloads.

Accuracy results:

client maximum increases +1 on each iteration (same as expected for connection tracking):
  exact               = 8.33e+08
  sum/sumsq (128-bit) = 8.33e+08
  Welford (128-bit)   = 8.33e+08
  sum/sumsq (64-bit)  = 8.33e+08
  Welford (64-bit)    = 32.4295

client maximum randomly increases in a range (1 - 10) on each iteration (possible for non-idempotent request tracking):
  exact               = 2.53805e+10
  sum/sumsq (128-bit) = 2.53805e+10
  Welford (128-bit)   = 2.53805e+10
  sum/sumsq (64-bit)  = -2.95145e+15
  Welford (64-bit)    = 32.43

client maximum randomly increases in a range (1 - 100) on each iteration:
  exact               = 2.12403e+12
  sum/sumsq (128-bit) = 2.12403e+12
  Welford (128-bit)   = 2.12403e+12
  sum/sumsq (64-bit)  = -2.52534e+17
  Welford (64-bit)    = 32.4224

client maximum randomly increases in a range (1 - 1000) on each iteration (possible for memory usage tracking, since we are planning to track memory usage in pages):
  exact               = 2.08852e+14
  sum/sumsq (128-bit) = 2.08852e+14
  Welford (128-bit)   = 2.08852e+14
  sum/sumsq (64-bit)  = -2.47926e+19
  Welford (64-bit)    = 32.419

Part-of: training/defence mode implementation
Issue: #1346
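
For readers who want to see the two helpers named in the commit message above, here is a minimal userspace sketch. It assumes a compiler with `unsigned __int128` (GCC or Clang); the names `div_u128_u32` and `isqrt_u128` are hypothetical and are not taken from the PR, and the kernel code may be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/*
 * Divide a 128-bit value by a non-zero 32-bit divisor with bitwise
 * (shift-subtract) long division. Returns the quotient and stores the
 * remainder in *rem when rem is not NULL.
 */
static u128
div_u128_u32(u128 n, uint32_t d, uint32_t *rem)
{
	u128 q = 0;
	uint64_t r = 0;	/* always < d before the shift, so it fits */
	int i;

	for (i = 127; i >= 0; i--) {
		/* Bring down the next bit of the dividend. */
		r = (r << 1) | (uint64_t)((n >> i) & 1);
		if (r >= d) {
			r -= d;
			q |= (u128)1 << i;
		}
	}
	if (rem)
		*rem = (uint32_t)r;
	return q;
}

/*
 * Integer square root by binary search: the largest 64-bit value whose
 * square does not exceed n.
 */
static uint64_t
isqrt_u128(u128 n)
{
	uint64_t lo = 0, hi = UINT64_MAX;

	while (lo < hi) {
		/* Round the midpoint up so the loop always makes progress. */
		uint64_t mid = lo + (hi - lo) / 2 + 1;

		if ((u128)mid * mid <= n)
			lo = mid;
		else
			hi = mid - 1;
	}
	return lo;
}
```

The division loop runs a fixed 128 iterations and the square root at most 64, so both have a bounded cost regardless of the input.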

Add a generic training/defence subsystem used to detect abnormal behavior based on z-score statistics. The implementation provides:
- training mode: collect per-event statistics (sum, sumsq, count) using percpu counters to minimize contention;
- defence mode: evaluate incoming values against the calculated mean/std and reject anomalies exceeding the configured z-score threshold (drop the connection with TCP RST).

Use the adaptive limits (training/defence) library with per-client connection tracking. Maintain the current and maximum number of concurrent connections per client and update the statistics on each new maximum of concurrent client connections. In defence mode, calculate the z-score for the client on each newly established connection and drop the connection if the z-score exceeds the configured threshold.

The classical Welford algorithm was evaluated but found unsuitable for this workload. In its original form Welford assumes an append-only stream of samples, where each new observation increases the sample count. In our case, "n" represents the number of clients rather than the number of events. For each client we continuously update the current maximum number of connections/requests/memory/cpu usage. When a value changes, the previous sample must be removed from the aggregated statistics before the updated value is inserted. This requires a replace/update operation rather than append-only updates, which implies a reversible variant of Welford’s algorithm and significantly increases implementation complexity.

We therefore use a sum/sumsq based approach. Although sum/sumsq is generally considered less numerically stable than Welford’s algorithm due to potential catastrophic cancellation when subtracting large nearly equal values, this is not a concern in our case. For the expected value ranges in production workloads, such pathological distributions (e.g. values clustered around 1e9 with variance ≈ 1) are not realistic, and numerical precision remains sufficient.

Part-of: training/defence mode implementation
Issue: #1346
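
The replace/update operation described above is cheap with sum/sumsq: swapping a client's old maximum for a new one only shifts `sum` by the difference and `sumsq` by the difference of squares, while `count` stays the same. The sketch below is a userspace illustration under stated assumptions: `struct stats` and the `stats_*` functions are hypothetical names, and the squared, integer-only threshold comparison is a choice of this sketch, not necessarily what the PR does (the PR adds an integer square root helper, so it may compute the standard deviation explicitly). The threshold is assumed non-negative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef __int128 s128;

/* Hypothetical aggregate over per-client maxima; not the PR's structure. */
struct stats {
	s128	sum;	/* sum of per-client maxima */
	s128	sumsq;	/* sum of squared per-client maxima */
	int64_t	count;	/* number of clients, not number of events */
};

/* A new client joins with an initial maximum. */
static void
stats_add(struct stats *s, int64_t v)
{
	s->sum += v;
	s->sumsq += (s128)v * v;
	s->count++;
}

/* Replace a client's old maximum with a new one; count is unchanged. */
static void
stats_replace(struct stats *s, int64_t old_max, int64_t new_max)
{
	s->sum += new_max - old_max;
	s->sumsq += (s128)new_max * new_max - (s128)old_max * old_max;
}

/*
 * Return true if the z-score of x exceeds thr. With n = count:
 *   mean = sum / n,  var = (n * sumsq - sum^2) / n^2,
 *   z = (n * x - sum) / sqrt(n * sumsq - sum^2).
 * Both sides are compared squared to avoid sqrt and floating point.
 * Very large inputs could still overflow 128 bits in this sketch.
 */
static bool
stats_zscore_exceeds(const struct stats *s, int64_t x, int64_t thr)
{
	s128 num = (s128)s->count * x - s->sum;
	s128 var_n2 = (s128)s->count * s->sumsq - s->sum * s->sum;

	if (num <= 0)
		return false;	/* at or below the mean */
	return num * num > (s128)thr * thr * var_n2;
}
```

For per-client maxima 1, 2, 3, 4, 5 the mean is 3 and the variance is 2, so a value of 6 has a z-score of about 2.12: it exceeds a threshold of 2 but not a threshold of 3.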

Use the adaptive limits framework to track per-client in-flight non-idempotent requests, since only such requests occupy upstream connections and therefore are suitable for overload detection.

Introduce `TfwAdaptiveLimitLock`, a generic adaptive limit structure with a per-CPU counter, per-epoch maximum tracking, and synchronization for training epoch transitions. Extend the adaptive limits library with helpers for request accounting and z-score calculation, reusing the existing logic.

Tracking of in-flight non-idempotent requests is performed in two stages:
- We account non-idempotent requests in the HTTP layer by incrementing the counter when a non-idempotent request is queued and decrementing it once the request completes. At this stage the current request count is updated using per-CPU counters without acquiring any locks.
- The second stage occurs in the `on_rcv_finish` callback at the end of `ss_tcp_process_data`. At this point, the current number of in-flight requests is obtained by aggregating all per-CPU counters. If the aggregated value exceeds the previously recorded maximum, the maximum is updated atomically and the corresponding deltas are applied to the global `sum` and `sumsq` statistics. This aggregated value is also used in defence mode for z-score calculation and for deciding whether the client should be blocked.

This approach avoids expensive synchronization on every request while still maintaining accurate client maxima for statistical analysis.

Part-of: training/defence mode implementation
Issue: #1346
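
The two stages can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: per-CPU storage is emulated by a plain array indexed by a CPU number, C11 atomics stand in for the kernel's atomic types, and `struct client_limit` and the `limit_*` names are hypothetical (the role of `limit_update_max` loosely mirrors `tfw_client_counter_change_max()` from the reviewed diff, whose actual body is not shown in this conversation).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_SKETCH	4	/* hypothetical CPU count for the sketch */

struct client_limit {
	int64_t		percpu[NR_CPUS_SKETCH];	/* stage 1: per-CPU slots */
	_Atomic int64_t	max;			/* stage 2: recorded maximum */
};

/* Stage 1: account a queued (+1) or completed (-1) request locally. */
static void
limit_account(struct client_limit *l, int cpu, int64_t delta)
{
	l->percpu[cpu] += delta;
}

/*
 * Aggregate all per-CPU slots. A single slot may be negative when a
 * request is queued on one CPU and completed on another; only the
 * total is meaningful.
 */
static int64_t
limit_current(const struct client_limit *l)
{
	int64_t sum = 0;
	int i;

	for (i = 0; i < NR_CPUS_SKETCH; i++)
		sum += l->percpu[i];
	return sum;
}

/*
 * Stage 2: raise the recorded maximum if curr exceeds it. On success
 * store the deltas for the global sum and sumsq and return true.
 */
static bool
limit_update_max(struct client_limit *l, int64_t curr,
		 int64_t *dsum, int64_t *dsumsq)
{
	int64_t old = atomic_load(&l->max);

	/* A failed CAS reloads 'old', so the bound is re-checked. */
	while (curr > old) {
		if (atomic_compare_exchange_weak(&l->max, &old, curr)) {
			*dsum = curr - old;
			*dsumsq = curr * curr - old * old;
			return true;
		}
	}
	return false;
}
```

Only the thread that wins the compare-and-swap reports deltas, so each increase of the maximum is applied to the global statistics exactly once.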

Add per-socket training_epoch field to track the training generation for connection-related statistics. This allows associating socket events with a specific training period and prevents mixing measurements across training epochs when switching between TRAINING and DEFENCE modes.
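
One way to picture the epoch check is the sketch below. It is only an illustration: `struct sock_sketch`, `g_epoch` and the helper names are hypothetical, and the 16-bit width is borrowed from the `u16 epoch` field visible in the reviewed diff. The real field lives in the socket structure and the PR's actual logic may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-socket state. */
struct sock_sketch {
	uint16_t training_epoch;
};

/* Hypothetical global training generation. */
static uint16_t g_epoch;

/* Stamp the socket with the generation it was established in. */
static void
sock_on_establish(struct sock_sketch *sk)
{
	sk->training_epoch = g_epoch;
}

/* Start a new training period; older sockets become stale. */
static void
training_restart(void)
{
	g_epoch++;
}

/* Events from a socket count only if it belongs to the current epoch. */
static bool
sock_event_is_current(const struct sock_sketch *sk)
{
	return sk->training_epoch == g_epoch;
}
```

A socket established before a restart fails the check, so its later events (for example, its close) are not subtracted from statistics that never counted its open.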

Extend the adaptive limits framework to track per-client CPU usage during request/response processing and use it as an additional overload detection metric.

Introduce a CPU adaptive limit based on `TfwAdaptiveLimitLock` and integrate it into the existing training and defence infrastructure. Unlike request tracking, CPU usage is accumulated using an exponential moving average (EMA), which provides a stable estimate of client CPU consumption without introducing synchronization overhead. (A simple counter would grow monotonically throughout the lifetime of a client, making it unsuitable for anomaly detection. The EMA provides a bounded and continuously adapting estimate of recent CPU activity.)

CPU usage is tracked in two places:
- Measure processing time by recording CPU cycles at the beginning of `ss_tcp_process_data()` and calculating the elapsed time in the `conn_recv_finish` callback after all received data has been processed. The measured delta is used to update the client's CPU usage statistics. (This is the primary accounting path.)
- CPU usage is also accounted during response processing in `tfw_http_msg_process_generic`. In this case, CPU cycles are measured at the function entry and exit.

During training, aggregate per-CPU EMA values, update the recorded maximum CPU usage, and adjust the global statistical model. During defence mode, calculate the client's CPU usage z-score and drop the connection when it exceeds the configured threshold. Reuse the existing adaptive limits infrastructure and IP blocking mechanism for enforcement.

Part-of: training/defence mode implementation
Issue: #1346
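
To make the EMA argument concrete, here is a minimal integer sketch. The smoothing factor (alpha = 1/8) and the name `ema_update` are assumptions for illustration; the commit message does not state which weight or fixed-point format the PR uses.

```c
#include <stdint.h>

#define EMA_WEIGHT	8	/* hypothetical: alpha = 1 / EMA_WEIGHT */

/*
 * One EMA step: ema' = ema * (1 - alpha) + sample * alpha, in integer
 * arithmetic. The result never exceeds max(ema, sample), so the
 * estimate stays bounded by the observed samples. Inputs are assumed
 * small enough that ema * (EMA_WEIGHT - 1) does not overflow.
 */
static uint64_t
ema_update(uint64_t ema, uint64_t sample)
{
	return (ema * (EMA_WEIGHT - 1) + sample) / EMA_WEIGHT;
}
```

Feeding a constant sample of 1000 cycles moves the estimate toward 1000 and keeps it there, whereas a plain counter would keep growing by 1000 on every call; once the samples drop to zero the estimate decays again.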

Use the training library for client memory usage tracking, with the `TfwAdaptiveLimitLock` structure holding the per-client state. In defence mode, calculate the z-score in the `tfw_http_conn_recv_finish` callback, compare it with the configured `threshold` and drop the client connection if necessary (same as we do for non-idempotent requests). The current approach with per-cpu request accounting prevents performance degradation.

Note that we also adjust memory usage in the per-cpu `mem` storage to check the `soft` and `hard` mem limits. This must be a separate storage, because we zero `TfwAdaptiveLimitLock` at the start of a new training and do not account events from the previous training in `TfwAdaptiveLimitLock`.

Performance measurements for the whole patchset were made and show no measurable regression:

Training:
  finished in 50.03s, 1205382.84 req/s, 933.22MB/s
  finished in 50.03s, 1206352.90 req/s, 935.01MB/s
  finished in 50.03s, 1212849.66 req/s, 940.37MB/s

Defence:
  finished in 50.03s, 1202041.02 req/s, 931.99MB/s
  finished in 50.03s, 1221799.64 req/s, 947.31MB/s
  finished in 50.02s, 1214020.14 req/s, 941.28MB/s

Master:
  finished in 50.03s, 1204474.98 req/s, 932.55MB/s
  finished in 50.03s, 1214912.74 req/s, 941.36MB/s
  finished in 50.03s, 1221197.26 req/s, 946.84MB/s

Part-of: training/defence mode implementation
Issue: #1346

EvgeniiMekhanik requested a review from const-t June 9, 2026 19:11

EvgeniiMekhanik changed the title ~~Mekhanik evgenii/trainging tmp design~~ Trainging mod implementation (WIP) Jun 9, 2026

EvgeniiMekhanik force-pushed the MekhanikEvgenii/trainging-TMP-design branch 15 times, most recently from f418c55 to b86628c Compare June 15, 2026 18:46

const-t reviewed Jun 16, 2026

EvgeniiMekhanik force-pushed the MekhanikEvgenii/trainging-TMP-design branch 12 times, most recently from 4b8f8f9 to 96e0ae8 Compare June 22, 2026 11:57

EvgeniiMekhanik force-pushed the MekhanikEvgenii/trainging-TMP-design branch 2 times, most recently from 4681521 to 40ac0a7 Compare June 22, 2026 14:47

EvgeniiMekhanik marked this pull request as draft June 22, 2026 14:48

EvgeniiMekhanik force-pushed the MekhanikEvgenii/trainging-TMP-design branch 3 times, most recently from e48e696 to cd3f102 Compare June 22, 2026 19:15

EvgeniiMekhanik marked this pull request as ready for review June 22, 2026 19:15

EvgeniiMekhanik force-pushed the MekhanikEvgenii/trainging-TMP-design branch from cd3f102 to 5f843e6 Compare June 23, 2026 10:51

EvgeniiMekhanik marked this pull request as draft June 25, 2026 21:22

EvgeniiMekhanik force-pushed the MekhanikEvgenii/trainging-TMP-design branch from 55c3eab to 127ad54 Compare June 26, 2026 11:40

EvgeniiMekhanik marked this pull request as ready for review June 26, 2026 14:44

EvgeniiMekhanik added 6 commits June 26, 2026 22:49

EvgeniiMekhanik force-pushed the MekhanikEvgenii/trainging-TMP-design branch from 127ad54 to 6652af4 Compare June 27, 2026 09:52

EvgeniiMekhanik wants to merge 6 commits into master from MekhanikEvgenii/trainging-TMP-design

2 participants
